Problem Set 2 - Solutions

library(tidyverse)
library(shiny)
library(lubridate)
my_theme <- theme_bw() +
  theme(
    panel.background = element_rect(fill = "#f7f7f7"),
    panel.grid.minor = element_blank(),
    axis.ticks = element_blank(),
    plot.background = element_rect(fill = "transparent", colour = NA)
  )
theme_set(my_theme)

Interactive German Traffic

Scoring

  • a - b, Design (1 points): Creative and readable (1 point), generally appropriate but with some lack of critical attention (.5 points), difficult to read (0 points)

  • a - b, Code (0.5 points): Clear and concise (0.5 points), correct but unnecessarily complex (0.25 points), missing (0 points)

  • c, Design and Discussion (1 points): Creative question, solution, and interpretation (1 point), appropriate question, solution, and interpretation, but perhaps simplistic question / difficult to read design / underdeveloped interpretation (0.5 points), misleading design or no interpretation (0 points)

  • c, Code (0.5 points): Clear and concise (0.5 points), correct but unnecessarily complex (0.25 points), missing (0 points)

  • a - b, Design (0.5 points each part): Creative and readable design (0.5 points), generally appropriate but lacking critical attention (0.5 points), difficult to read (0 points).

  • a - b, Code (0.5 points each part): Clear and concise code (0.5 points), correct but unecessarily complex (0.25 points), missing (0 points).

  • c, Discussion (1 point): Creative and well-developed design proposal (1 point), appropriate but perhaps underdeveloped proposal (0.5 points), inappropriate or unclear proposal (0 points).

Question

This problem will revisit the previous problem from an interactive point of view. We will build a visualization that helps users explore daily traffic patterns across multiple German cities, using interactivity to help users navigate the collection. We will need additional features related to the day of the week for each timepoint, created by the wday function below,

traffic <- read_csv("https://uwmadison.box.com/shared/static/x0mp3rhhic78vufsxtgrwencchmghbdf.csv") %>%
 mutate(day_of_week = wday(date))

Example Solution

  1. Design and implement a Shiny app that allows users to visualize traffic over time across selected subsets of cities. Make sure that it is possible to view data from more than one city at a time. It is not necessary to label the cities within the associated figure.

    We first define a function that, when given a subset of cities, draws a line plot.

    plot_traffic <- function(df) {
      ggplot(df) +
        geom_line(aes(date, value, group = name)) +
        labs(x = "Date", y = "Traffic") +
        theme(axis.title = element_text(size = 20))
    }

    Our design will update a time series plot of all the cities every time a dropdown menu is updated. We will allow multiple cities to be selected simultaneously. Specifically, our UI has an input for choosing cities and displays the line plot as an output. Our server recognizes changes in the choice of cities, filters the data to that subset, and then draws the updated time series.

    ui <- fluidPage(
      selectInput("city", "City", unique(traffic$name), multiple = TRUE),
      plotOutput("time_series")
    )
    
    server <- function(input, output) {
      output$time_series <- renderPlot({
        traffic %>%
          filter(name %in% input$city) %>%
          plot_traffic()
      })
    }
    
    shinyApp(ui, server)
  1. Introduce new inputs to allow users to select a contiguous range of days of the week. For example, the user should have a way of zooming into the samples taken within the Monday - Wednesday range.

    We use nearly the same design except that a new slider input is provided for choosing days of the week. When a range of days is chosen, then the time series will show only that range for the currently selected cities.

    ui <- fluidPage(
      selectInput("city", "City", unique(traffic$name), multiple = TRUE),
      sliderInput("day_of_week", "Days", 2, 7, c(2, 7)),
      plotOutput("time_series")
    )
    
    server <- function(input, output) {
      output$time_series <- renderPlot({
        traffic %>%
          filter(
            name %in% input$city, 
            day_of_week >= input$day_of_week[1] & day_of_week <= input$day_of_week[2]
          ) %>%
          plot_traffic()
      })
    }
    
    shinyApp(ui, server)
  1. Propose, but do not implement, at least one alternative strategy for supporting user queries from either part (a) or (b). What are the tradeoffs between the different approaches in terms of visual effectiveness and implementation complexity?

    One alternative implementation could use a graphical query (instead of a dropdown menu) to select cities of interest. We could show the cities as a scatterplot on a map, with circle size reflecting their average weekly traffic level. By brushing cities on the map, we could update the currently displayed time series.

    This approach would be more effective for multiple cities that are geographically close by. It could also guide users towards cities that have higher vs. lower traffic levels, to see if they have any systematic differences. It would be less effective if the cities that we would want to compare with one another are not geographically close (in that case, we might imagine using another feature of the cities to guide the query).

    The implementation becomes somewhat more complex, because we would have to use a brush query and implement a map. If users are familiar with the locations of the cities, and if spatial queries are of genuine interest, then this additional cost might be worthwhile.

NYC Rentals

Scoring

  • a Design and code (0.5 points): Correct and polished static visualization (0.5 points), correct visualization but in need of refinement (0.25 points), missing or difficult to read visualization (0 points).
  • b Design and code (1 point): Clear and effective visual design and implementation (0.5 points), correct but not refined design or implementation (0.25 points), messy or unclear design or implementation (0 points).
  • c (1 point). Same criteria as part (b).
  • d Discussion (0.5 points): Correct and well-developed interpretation (0.5 points), correct but somewhat underdeveloped (0.25 points), missing or incorrect interpretations (0 points).

Question

In this problem, we’ll create a visualization to dynamically query a dataset of Airbnb rentals in Manhattan in 2019. The steps below guide you through the process of building this visualization.

Example Solution

  1. Make a scatterplot of locations (Longitude vs. Latitude) for all the rentals, colored in by room_type.

    The main logic for this figure is given in the ggplot and geom_point layers below. We use scale_color_manual to create a custom color scheme, a guide_legend to allow the legend points to stand out more clearly than the scatterplot points, and coord_fixed to keep longitude and latitude coordinates in proportion with one another.

    rentals <- read_csv("https://uwmadison.box.com/shared/static/zi72ugnpku714rbqo2og9tv2yib5xped.csv")
    ggplot(rentals) +
      geom_point(aes(longitude, latitude, col = room_type), size = 0.3, alpha = 0.6) +
      scale_color_manual(values = c("#3F4B8C","#F26444", "#40331D")) +
      guides(col = guide_legend(override.aes = list(alpha = 1, size = 2))) +
      labs(col = "Room Type") +
      coord_fixed() +
      theme_void()

  2. Design a plot and a dynamic query so that clicking or brushing on the plot updates the points that are highlighted in the scatterplot in (a). For example, you may query a histogram of prices to focus on neighborhoods that are more or less affordable.

    We will implement the suggested design, using a brushed histogram to highlight all the units within a specified price range. The map will always show all points, but their size and opacity will be updated to reflect the brush selection.

    ui <- fluidPage(
      h3("NYC Airbnb Rentals"),
      fluidRow(
        column(6,
               plotOutput("histogram", brush = brushOpts("plot_brush", direction = "x"), height = 200),
               dataTableOutput("table")
        ),
        column(6, plotOutput("map", height = 600)),
      ),
      theme = bs_theme(bootswatch = "minty")
    )
    
    server <- function(input, output) {
      selected <- reactiveVal(rep(TRUE, nrow(rentals)))
      observeEvent(input$plot_brush, {
        selected(brushedPoints(rentals, input$plot_brush, allRows = TRUE)$selected_)
      })
    
      output$histogram <- renderPlot(overlay_histogram(rentals, selected()))
      output$map <- renderPlot(scatterplot(rentals, selected()))
      output$table <- renderDataTable(filter_df(rentals, selected()))
    }
    
    shinyApp(ui, server)

    We have encapsulated the code for updating the plots into separate functions, printed below. We make sure that both the histogram and scatterplot highlight the currently selected range. This is the reason for using two geom_histogram layers in overlay_histogram – one layer is needed for the context and a second is used for the currently highlighted selection.

    scatterplot <- function(df, selected_) {
      df %>%
        mutate(selected = selected_) %>%
        ggplot() +
        geom_point(
          aes(
            longitude, latitude, col = room_type, 
            alpha = as.numeric(selected),
            size = as.numeric(selected)
          )
        ) +
        scale_color_manual(values = c("#3F4B8C","#F26444", "#40331D"), guide = "none") +
        scale_alpha(range = c(0.1, .5), guide = "none") +
        scale_size(range = c(0.1, .9), guide = "none") +
        coord_fixed() +
        theme_void()
    }
    
    overlay_histogram <- function(df, selected_) {
      sub_df <- filter(df, selected_)
      ggplot(df, aes(trunc_price, fill = room_type)) +
        geom_histogram(alpha = 0.3, binwidth = 25) +
        geom_histogram(data = sub_df, binwidth = 25) +
        scale_y_continuous(expand = c(0, 0, 0.1, 0)) +
        scale_fill_manual(values = c("#3F4B8C","#F26444", "#40331D")) +
        labs(
          fill = "Room Type",
          y = "Count",
          x = "Price"
        )
    }
    
    filter_df <- function(df, selected_) {
      filter(df, selected_) %>%
        select(name, price, neighbourhood, number_of_reviews) %>%
        rename(Name = name, Price = price, Neighborhood = neighbourhood, `Number of Reviews` = number_of_reviews)
    }

  1. Implement the reverse graphical query. That is, allow the user to update the plot in (b) by brushing over the scatterplot in (a).

    We can use almost exactly the same code as in the above app. The only difference is that we add a brush to our map. By keeping the brush IDs the same across the two plotOutputs, we ensure that the plot is updated whenever either brush is changed.

    ui <- fluidPage(
      h3("NYC Airbnb Rentals"),
      fluidRow(
        column(6,
               plotOutput("histogram", brush = brushOpts("plot_brush", direction = "x"), height = 200),
               dataTableOutput("table")
        ),
        column(6, plotOutput("map", brush = "plot_brush", height = 600)),
      ),
      theme = bs_theme(bootswatch = "minty")
    )
  1. Comment on the resulting visualization(s). If you had a friend who was interested in renting an Airbnb in NYC, what would you tell them?

    There are many ways you could interpret this visualization to guide the selection of rentals. If your friend were looking for more affordable units, you could direct them to units uptown, but with the caveat that they would be more likely to be shared (rather then a rental for the entire apartment).

Random Point Transitions

Scoring

  1. Code (1 point): Concise and effectively discussed implementation using .enter() and .append() (1 point), correct but complex or unjustified implementation (0.5 points), incorrect or poorly explained implementation (0 points).
  2. Code (1 point): Correct and concise extension of part (a) (1 point), technically correct but could be further refined or discussed (0.5 points), in correct or insufficiently discussed (0 points).
  3. Code and Design (1 point): Creative implementation that builds naturally from part (b) (1 point), appropriate implementation but could be refined (0.5 point), missing or poorly explained implementation (0 points).

Question

This exercise will give practice implementing transitions on simulated data. The code below generates a random set of 10 numbers,

let generator = d3.randomUniform();
let x = d3.range(10).map(generator);

Example Solution

  1. Encode the data in x using the x-coordinate positions of 10 circles.

    The D3 code for this encoding must bind the data and then set the cx attribute according to the current data value. Note that the radius and cy attributes did not have to be set here – since they are constant across all data elements, we put their values into the CSS file.

    let generator = d3.randomUniform();
    let x = d3.range(10).map(generator);
    
    d3.select("svg")
      .selectAll("circle")
      .data(x).enter()
      .append("circle")
      .attr("cx", d => 900 * d)

    We had used the following HTML and CSS, which are similar to all the examples used in class. They are just an empty SVG on a page that loads the required resources.

    HTML:

    <!DOCTYPE html>
    <html>
      <head>
        <script src="https://d3js.org/d3.v7.min.js"></script>
        <script src="https://d3js.org/d3-selection-multi.v1.min.js"></script>
        <link rel="stylesheet" href="q3.css">
      </head>
      <body>
        <svg height=500 width=900>
        </svg>
      </body>
      <script src="q3a.js"></script>
    </html>

    CSS:

    circle {
      cy: 250;
      r: 20
    }
  1. Animate the circles. Specifically, at fixed time intervals, generate a new set of 10 numbers, and smoothly transition the original set of circles to locations corresponding to these new numbers.

    We add the following lines to the javascript in part (a). This is creating a new x array and transitioning the points to a new cx based on the newly bound data. We create the animation by repeatedly calling update using d3.interval().

    function update() {
      x = d3.range(10).map(generator);
      d3.selectAll("circle")
        .data(x)
        .transition()
        .duration(1000)
        .attrs({
          cx: d => 900 * d,
        })
    }
    
    d3.interval(update, 1000)
  1. Extend your animation so that at least one other attribute is changed at each time step. For example, you may consider changing the color or the size of the circles. Make sure that transitions remain smooth (e.g., if transitioning size, gradually increase or decrease the circles’ radii).

    We modify our .attrs function above to set random radii and colors. Note that we had to remove r from the CSS in part (a) to ensure that it doesn’t overrule our D3-defined r attribute.

    .attrs({
      cx: d => 900 * d,
      r: d => 50 * generator(),
      fill: d => `hsl(${360 * generator()},${100 * generator()}%,${20 + 80 * generator()}%)`
    })

Bar Chart Transitions

Scoring

  • Code (2 points): Clear and concise code (0.5 points), correct but unnecessarily complex (0.25 points), missing (0 points).
  • Completeness (2 points): Implements all required functionality (2 points), implements most requirements (1 point), fails to implement key functionality (0 points).

Question

This problem continues [Simple Bar Chart] above. We will create a bar chart that adds and removes one bar each time a button is clicked. Specifically, the function below takes an initial array x and creates a new array that removes the first element and adds a new one to the end. Using D3’s generate update pattern, write a function that updates the visualization from [Simple bar chart] every time that update_data() is called. New bars should be entered from the left, exited from the right, and transitioned after each click. Your solution should look (roughly) like this example.

let bar_ages = [],
generator = d3.randomUniform(0, 500),
id = 0;

function update() {
  bar_ages = bar_ages.map(d => { return {id: d.id, age: d.age + 1, height: d.height }})
  bar_ages.push({age: 0, height: generator(), id: id});
  bar_ages = bar_ages.filter(d => d.age < 5)
  id += 1;
}

Example Solution

This is an exercise in using the general update pattern. On each update, we need to rebind the updated data array, making sure to associate each HTML tag with the .id attribute in each underlying array object. We enter and append a new rectangle at the left using .attrs({ x: 0, y: 500 }). Note that at this point, the rectangle has no height – we get the nice transition effect (bar rising to full height) by using the update in the following block. Since the Canvas SVG has its origin (0, 0) at the top left corner, and since the y coordinate of a rectangle also corresponds to the top left corner, we set the y and height values to,

...
y: d => 500 - d.height,
height: d => d.height
...

This ensures that y + height is 500, so that the bottom of each rectangle is always at y-coordinate 500.

HTML:

<!DOCTYPE html>
<html>
  <head>
    <script src="https://d3js.org/d3.v7.min.js"></script>
    <script src="https://d3js.org/d3-selection-multi.v1.min.js"></script>
  </head>
  <body>
    <button id="my_button" onclick="update()">Click</button>
    <svg height=500 width=900>
    </svg>
  </body>
  <script src="q4.js"></script>
</html>
let bar_ages = [],
generator = d3.randomUniform(0, 500),
id = 0;

function update() {
  bar_ages = bar_ages.map(d => { return {id: d.id, age: d.age + 1, height: d.height }})
  bar_ages.push({age: 0, height: generator(), id: id});
  bar_ages = bar_ages.filter(d => d.age < 5)
  id += 1;

  let selection = d3.select("svg")
    .selectAll("rect")
    .data(bar_ages, d => d.id)

  // Enter the new rectangle on the left
  selection.enter()
    .append("rect")
    .attrs({ x: 0, y: 500 })

  // Update all heights and locations
  d3.select("svg")
    .selectAll("rect")
    .transition()
    .duration(1000)
    .attrs({
      x: d => (900 / 5) * d.age,
      y: d => 500 - d.height,
      height: d => d.height,
      width: 100
    })

  // Exit the old rectangle on the right
  selection.exit()
    .transition()
    .duration(1000)
    .attrs({ y: 500 height: 0})
    .remove()
}

Transition Taxonomy

Scoring

  • a, (1 point): Correct and clearly explained choice of transition type (1 point), correct choice of transition type but with less convincing justification (0.5 points), inappropriate choice of transition (0 points).
  • b, (1 point): Correct identification of all component SVG types (1 point), identification of most but not all SVG types (0.5 points), incorrect analysis of SVG component types (0 points).
  • c, (1 point): Complete and correct deconstruction of graphical transitions (1 point), generally correct but underdeveloped deconstruction (0.5 points), incorrect or vague discussion of transitions (0 points).

Question

In “Animated Transitions in Statistical Graphics,” Heer and Robertson introduce a taxonomy of visualizations transitions. These include,

  • View Transformation: We can move the “camera view” associated with a fixed visualization. This includes panning and zooming, for example.
  • Filtering: These transitions remove elements based on a user selection. For example, we may smoothly remove points in a scatterplot based on a dropdown menu selection.
  • Substrate Transformation: This changes the background context on which points lie. For example, we may choose to rescale the axis in a scatterplot to show a larger range.
  • Ordering: These transitions change the ordering of an ordinal variable. For example, we may transition between sorting rows of a heatmap alphabetically vs. by their row average.
  • Timestep: These transitions smoothly vary one plot to the corresponding plot at a different timestep. For example, we might show “slide” a time series to the left to introduce data for the most recent year.
  • Visualization Change: We may change the visual encoding used for a fixed dataset. For example, we may smoothly transition from a bar chart to a pie chart.
  • Data Scheme Change: This changes the features that are displayed. For example, we may smoothly turn a 1D point plot into a 2D scatterplot by introducing a new variable.

In this problem, we will explore how these transitions arise in practice and explore how they may be implemented.

Example Solution

  1. Pick any visualization from the New York Times Upshot, Washington Post Visual Stories, the BBC Interactives and Graphics, or the Guardian Interactives pages. Describe two transitions that it implements. Of the 7 transition types given above, which is each one most similar to? Explain your choice.

We will analyze the interactive map and horizon plots from Mapping the Spread of Drought Across the U.S.. Two transitions implemented in this report are,

  • Map transitions: When the reader drags a slider, they are able to see changes in drought severity across the continental US. This exactly falls into the “Timestep” type of transition, because it shows the same view (the drought map) but with different underlying data as the sider is changed.
    • Horizon plot details: When the user places a mouse on the horizon plot, it shows the exact percentage associated with that timepoint. This change is more ambiguous, but it could be considered a visualization change, because the underlying data have not changed, but the marks encoding them have. Specifically, the encoding had previously only included the colors and position on the horizon plot, but after interaction, it provides an additional bar and text overlay to encode the same information.
  1. For any transition (which may or may not be one of those you chose in (a)), identify the types of graphical marks used to represent the data. How would you create this type of mark in SVG?

For the horizon plot interaction, we could use an SVG <rect> to draw the bar representing the currently howevered timepoint. The height of this SVG would reflect the severity of the drought at the current timepoint. There is also an HTML text element that is moved on each interaction, giving the tooltip representing the current bar’s height.

  1. To achieve the transition effect, how do you expect that the SVG elements would be modified / added / removed? Specifically, if elements are modified, what SVG attrs would be changed, and if elements are added or removed, how would the enter-exit-update pattern apply? You do not need to look at the code implementing the actual visualization, but you should give a plausible description of how the transition could be implemented in D3.

For the bar representing the hovered position, we would have to transition both the \(x\)-axis location and the height of the bar. There do not need to be any entrances or exits of HTML elements, since we are only ever showing one bar at a time. Roughly, we would bind data on the drought severity at the currently hovered year and then update the position of the bar and text that give details, like in the following pseudocode,

```r
d3.select("#focus_bar")
  .data(current_year)
  .attrs({
    height: d => scales.drought(d.drought),
    y: d => baseline - scales.drought(d.drought),
    x: d => scales.year(d.year)
  })
```

assuming that the object `scales` defines two linear scales mapping drought
severity to bar height and selected year to $x$ position, respectively.

Icelandic Population Analysis

Scoring

  • a, Discussion (1 point): Clear interpretation and takeaways (1 point), missing some important aspects of interpretation (0.5 points), incorrect or underdeveloped explanations (0 points).
  • b, Discussion (1 point): Accurate and complete explanation of the filtering and ID implementations, as well as their role in the transition (1.5 points), accurate but potentially incomplete explanations of these steps or their role (0.75 points), inaccurate or underdeveloped explanation (0 points).
  • c, Discussion (1 point): Correct and specific explanation of the transition effect (1 point), generally correct explanation but lacking important details (0.5 points), incorrect explanation (0 points).
  • d, Design and discussion (1 point): Creative and effective visual design proposal (1 point), appropriate but less well-developed deisgn proposal (0.5 points), ineffective or vaguely communicated proposal (0 points).

Question

In this problem, we will analyze the design and implementation of this interactive visualization of Iceland’s population.

Example Solution

  1. Explain how to read this visualization. What are two potential insights a reader could takeaway from this visualization?

    For reading the visualization,

    • At a fixed timepoint, the visualization describes the population age distribution. It shows how much of the population is in different age ranges. It also shows which age groups have a surplus of one vs. another gender.
    • When animated, it shows how the overall population as well as the age distribution has shifted over time. By changing the selected year, it allows comparison of the age distribution and gender surplus between different timepoints.

    Example takeaways,

    • The age distribution has shifted towards an older population as time progresses.
    • During the 19th century, there tended to be more women than men, but this has reversed in the 20th century.
    • A rapid increase in the number of children born right after 1950 is visible. There also appears to be an increase in the number of 30 - 40 year olds in the 2000’s and 2010’s, which might be a result of immigration into the country.
  2. The implementation uses the following data join,

    rect = rect
      .data(data.filter(d => d.year === year), d => `${d.sex}:${d.year - d.age}`)

    What does this code do? What purpose does it serve within the larger visualization?

    This binds the dataset filtered to the currently selected year. It is used in the general update pattern for entering, updating, and exiting bars. The entered bars correspond to a newly born cohort. The updates both age each cohort (shift the bars to the left) and change the number of individuals in them (update the heights of the rectangles). The exits remove the cohorts after they have passed age 105.

    The second part of the command above, d => \\({d.sex}:\){d.year - d.age}`)is an ID function. It maps each rectangle to a gender and generational cohort (note thatyear - age` is the year of birth) combination. If it didn’t include this ID, then the heights of the bars would change (reflecting the change in the population distribution), but we wouldn’t see the transitioning of one cohort from one age group to the next.

  3. When the bars are entered at Age = 0, they seem to “pop up,” rather than simply being appended to the end of the bar chart. How is this effect implemented?

    When the bars are entered, their height is set to be zero. It is only during the following block that the height attribute is set to the number of individuals in each cohort. Since the change is implemented with an intermediate transition rect.transition().attr(..., we are able to have a smooth “pop up” effect for this first bar.

  4. Suppose that you had comparable population-by-age data for two countries. What queries would be interesting to support? How would you generalize the current visualization’s design to support those queries?

    Some example queries that we might be interested in are,

    • At a given point in time, how to the age distributions compare to one another? Which country has a larger fraction of older or younger individuals?
    • Do the age distributions shift differently between the two countries? For example, if within a certain window of years, there is a rapid influx of immigrants within a certain age group for country A, is there a similar shift in that age group for country B?

    An example design that could address these queries is to turn the current visualization on its side, so that births enter from the bottom and deaths exit near the top. Then, we can place bars for the two countries facing out around a central axis. We could continue using the same animation, and by comparing the shapes on the left and right hand sides of the display, we would be able to see differences in the population age structure.